Abstract —Learning from synthetic data has many important and practical applications, such as photo-sketch recognition. Using synthetic data is challenging due to the differences in feature distributions between synthetic and real data, a phenomenon we term the synthetic gap. In this paper, we investigate and formalize a general framework, the Stacked Multichannel Autoencoder (SMCAE), that bridges the synthetic gap and enables learning from synthetic data more efficiently. In particular, we show that our SMCAE can not only transform and use synthetic data on the challenging face-sketch recognition task, but can also help simulate real images, which can be used for training classifiers for recognition. Preliminary experiments validate the effectiveness of the framework.

I. Introduction 

Modern supervised learning algorithms need plenty of data to train classifiers. More data of higher quality is always desirable in real-world applications, but sometimes it is beneficial to turn to synthetic data. For example, to help identify criminals, many criminal investigations can only rely on a synthetic face sketch, because a facial photograph of the suspect may not be available. Such synthetic face data are normally drawn by an expert based on descriptions from eyewitnesses and/or victim(s). Several photo-sketch examples are shown in Fig. 1. In this application, recognition based on synthetic data is crucial.

Directly using synthetic data in a learning algorithm is unfortunately very challenging, since synthetic data differ from real data at least to some extent, e.g. the exaggerated facial shapes in the sketch images in Fig. 1 compared with real images. As a result, the feature distributions of synthetic data may be shifted away from those of real data, as illustrated in Fig. 2. We term such a shift in distributions the synthetic gap. The synthetic gap is largely caused by the process that generates synthetic data: synthetic data are generated by replicating principal patterns such as the eyes, mouth, nose and hairstyle, rather than every detail of the real data. The synthetic gap is a major obstacle to using synthetic data in recognition problems, since synthetic data may fail to reproduce patterns of real data that are important for successful recognition. To solve this problem, we associate synthetic data with real data and jointly learn from them in a Stacked Multichannel Autoencoder (SMCAE), which helps bridge the synthetic gap by transforming the characteristics of synthetic data to better simulate real data.
Fig. 2. t-SNE visualization ED of the distribution of Histogram of Oriented Gradients (HOG) features in the CUFSF dataset [sT], (21. Left: a synthetic gap is observed between photo and sketch features; Right: the synthetic gap is bridged by our SMCAE.

This paper addresses the problem of learning a mapping from synthetic data to real data. Specifically, we propose a novel framework, SMCAE. The training process of SMCAE bridges the synthetic gap between real and synthetic data by learning two transformations: (1) synthetic to real data and (2) real to real data. In (2) the model learns the most essential ‘characters’ and ‘patterns’ of real data, while in (1) it learns how to augment the synthetic data to best reproduce the distribution of real data. Because the two tasks are learned simultaneously with shared parameters, the essential characteristics learned in (2) help to regularize the results of (1) and vice versa, as we illustrate in the handwritten digit experiments.

We highlight two main contributions of this paper: (1) To the best of our knowledge, this is the first attempt to address the synthetic gap problem, and we demonstrate that synthetic data can be used to improve performance on a recognition task. (2) We propose a Stacked Multichannel Autoencoder (SMCAE) model to bridge the synthetic gap and jointly learn from both real and synthetic data.





Fig. 1. Examples of face photos and sketches. Data comes from the CUFSF dataset ED, Ea 


II. Related Work 

Transfer Learning aims to extract knowledge from one or more source tasks and apply it to a target task. Transfer learning can be used in many different applications, such as web page classification 1^ and zero-shot classification ca. A more detailed survey of transfer learning is given by [TSl. Our method is a specific form of transfer learning, termed domain adaptation lO, [32], (SI. Unlike previous domain adaptation approaches, however, we assume that the synthetic gap is caused by a shift in the feature distribution of synthetic data away from that of real data, and that the main ‘characters’ and ‘patterns’ strongly co-exist in both synthetic and real data. Our SMCAE is built on this assumption.

Autoencoder is a special type of neural network whose output vectors have the same dimensionality as its input vectors (^ . Autoencoders and their variants Co), ca, 0, 1^ have been shown to be successful in learning and transferring shared knowledge among data sources from different domains 0,0, im, and thus benefit other machine learning tasks. Our framework borrows the idea of the autoencoder to jointly learn two different yet related tasks: mapping synthetic to real data, and mapping real to real data. It is worth noting that in Ea, a multimodal autoencoder with a structure similar to ours is proposed. Their multimodal autoencoder puts two standard autoencoders together by sharing a hidden layer. In their structure, the data at the input and output ends are fully symmetric, and each modality occupies one branch of the autoencoder. In contrast, the proposed SMCAE combines the structures of a standard autoencoder and a denoising autoencoder. With this composition, one branch of SMCAE is capable of exploring intrinsic features of data in one domain, while the other branch transfers data from one domain to another using features discovered by both branches. The structure of SMCAE can easily be extended to more branches to accommodate more complicated multi-task learning problems. Our experiments show that SMCAE outperforms the other autoencoder configurations in this regard.

Learning from synthetic templates. Some recent works on learning from synthetic data Ga, Ea, (a mostly generate synthetic data either by applying simple geometric transformations or by adding image degradation to real data. To help offline recognition of handwritten text [26], [27], a perturbation model combined with morphological operations is applied to real data. To enhance the quality of degraded documents [4], degradation models such as brightness, blurring, noise and texture-blending degradation were used to create a training dataset for a handwritten text recognition problem. These methods did not address the synthetic gap problem, and have therefore obtained only small performance improvements from synthetic data. In (T9l, 3D computer graphics models are used to ease training data generation. To simulate pedestrians in an image, the authors track volunteers' poses from multiple views and reshape human bodies using a morphable 3D human model. The reshaped human bodies are then composited with real-world backgrounds. The same idea has been adopted in [23], where, in addition to rendering a 3D model to simulate an object in a real scene, features extracted from synthetic data are adapted to better train an object detector.

III. Stacked Multichannel Autoencoder (SMCAE) 

We propose the SMCAE model to learn a mapping from synthetic to real data. To learn this mapping, SMCAE is formulated as a stacked structure of multichannel autoencoders, which provides an efficient and flexible way of jointly learning from both synthetic and real data. The structure and configuration of SMCAE are illustrated in Fig. 3.

Specifically, we assign the left and right tasks to the two channels of the SMCAE. The left task, shown in the left channel of Fig. 3, takes synthetic data as input and real data as the reconstruction target, while the right task in the right channel of Fig. 3 uses real data as both input and reconstruction target. All between-layer connections colored in gray are shared by the tasks of the two channels. Structured in this way, the SMCAE learns to transform synthetic data into real data in the left task using representations learned from real data in the right task.
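The two-channel structure can be sketched as a forward pass with one shared encoder and two channel-specific decoders. This is a minimal NumPy sketch, assuming a single hidden layer, sigmoid activations and rows of the data matrices as samples (so the weights are stored as transposes of $W_1$, $W_2$); the parameter names and initialization are hypothetical, not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, used here as the activation f(.)."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(d, k, seed=0):
    """Random parameters for a one-hidden-layer SMCAE (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    s = 1.0 / np.sqrt(d)
    return {
        "W1": rng.normal(0.0, s, (d, k)), "b1": np.zeros(k),        # shared encoder
        "W2_l": rng.normal(0.0, s, (k, d)), "b2_l": np.zeros(d),    # left decoder
        "W2_r": rng.normal(0.0, s, (k, d)), "b2_r": np.zeros(d),    # right decoder
    }


def smcae_forward(x_s, x_r, p):
    """Forward pass of both channels.

    x_s: synthetic inputs (n, d), paired row by row with real inputs x_r (n, d).
    The left output is compared against x_r (task (i:X_s, o:X_r)^l) and the
    right output is also compared against x_r (task (i:X_r, o:X_r)^r).
    """
    h_s = sigmoid(x_s @ p["W1"] + p["b1"])   # shared encoder on synthetic data
    h_r = sigmoid(x_r @ p["W1"] + p["b1"])   # same encoder on real data
    y_left = sigmoid(h_s @ p["W2_l"] + p["b2_l"])
    y_right = sigmoid(h_r @ p["W2_r"] + p["b2_r"])
    return y_left, y_right


# Usage: 5 paired samples with 16 features and 8 hidden units.
rng = np.random.default_rng(1)
x_s, x_r = rng.random((5, 16)), rng.random((5, 16))
y_left, y_right = smcae_forward(x_s, x_r, init_params(16, 8))
print(y_left.shape, y_right.shape)  # (5, 16) (5, 16)
```

Only the encoder parameters are shared, so the right channel's output depends on the real data alone, while the left channel maps synthetic inputs through the same encoder.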


Fig. 3. (left) Architecture of SMCAE: black edges between two layers are linked to and shared by the two tasks; red and blue links are connected separately to the left and right task, respectively. (right) Training of the i-th layer: a zoomed-in structure of SMCAE with a single hidden layer.































A. Problem setup 

We first describe the setup of a single layer in each channel of our SMCAE. A single channel of our SMCAE is basically an autoencoder f7lll28l. Assume an input dataset with $n$ instances $X = \{x_i\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{d}$. To encode the input data, we have $h_e(x_i) = f(W_1 x_i + b_1)$, where $f(\cdot)$ is a sigmoid function and $\theta_e = \{W_1, b_1\}$, with $W_1 \in \mathbb{R}^{k \times d}$ and $b_1 \in \mathbb{R}^{k}$, is the set of encoding parameters of the $j$-th layer. The decoding process is defined as $h_d(x_i) = f(W_2 h_e(x_i) + b_2)$, with decoding parameters $\theta_d = \{W_2, b_2\}$, $W_2 \in \mathbb{R}^{d \times k}$, $b_2 \in \mathbb{R}^{d}$, applied to the encoded representation $h_e(x_i)$.
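A single channel of this kind, with the reconstruction error, the weight-decay term and the KL sparsity penalty introduced below, can be sketched in NumPy as follows. The values of $\lambda$ and $\rho$ are placeholders (the text fixes only $\delta = 0.05$), the `target` argument lets the same function serve a channel whose reconstruction target differs from its input, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, the activation f(.)."""
    return 1.0 / (1.0 + np.exp(-z))


def kl_sparsity(delta, delta_hat):
    """Sum over hidden units of the Bernoulli KL divergence KL(delta || delta_hat_u)."""
    return np.sum(delta * np.log(delta / delta_hat)
                  + (1.0 - delta) * np.log((1.0 - delta) / (1.0 - delta_hat)))


def single_channel_cost(x, target, W1, b1, W2, b2, lam=1e-4, rho=3.0, delta=0.05):
    """Reconstruction error + lam * weight decay + rho * sparsity penalty.

    x and target have shape (n, d); rows are samples, so W1 is (d, k) and W2 is (k, d).
    For a plain autoencoder, pass target=x.
    """
    n = x.shape[0]
    h = sigmoid(x @ W1 + b1)                 # encoded representation h_e(x_i)
    y = sigmoid(h @ W2 + b2)                 # reconstruction h_d(x_i)
    recon = np.sum((y - target) ** 2) / n    # (1/n) * sum_i ||h_d(x_i) - target_i||^2
    decay = np.sum(W1 ** 2) + np.sum(W2 ** 2)
    delta_hat = h.mean(axis=0)               # average activation of each hidden unit
    return recon + lam * decay + rho * kl_sparsity(delta, delta_hat)
```

The KL term is zero only when every hidden unit's average activation equals $\delta$, so it pushes the hidden layer toward sparse activations rather than an identity mapping.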

To minimize the reconstruction error, we have

$$J(\theta_e, \theta_d) = \frac{1}{n} \sum_{i=1}^{n} \lVert h_d(x_i) - x_i \rVert^2 + \lambda \mathcal{W} \tag{1}$$

where $\mathcal{W} = \lVert W_1 \rVert_F^2 + \lVert W_2 \rVert_F^2$ is a weight decay term added to improve the generalisation of the autoencoder, and $\lambda$ controls the importance of this term. To avoid learning the identity mapping in the autoencoder, a regularisation term

$$\Omega = \sum_{u=1}^{k} \left[ \delta \log \frac{\delta}{\hat{\delta}_u} + (1 - \delta) \log \frac{1 - \delta}{1 - \hat{\delta}_u} \right]$$

that penalizes over-activation of the nodes in the hidden layer is added¹, where $\hat{\delta} = \frac{1}{n} \sum_{i=1}^{n} h_e(x_i)$ is the average activation of the hidden nodes over the dataset and $\hat{\delta}_u$ is its $u$-th component. Thus the objective of a single channel becomes:

$$J(\theta_e, \theta_d) = \frac{1}{n} \sum_{i=1}^{n} \lVert h_d(x_i) - x_i \rVert^2 + \lambda \mathcal{W} + \rho\, \Omega \tag{2}$$

where $\rho$ controls the sparsity of the representation in the hidden layer.

¹ $\delta$ is a sparsity parameter and is empirically set to 0.05 in all our experiments.

B. The SMCAE model

The structure of the SMCAE model extends an autoencoder so that it can simultaneously handle the tasks in both the left and right channels. Specifically, we use the notation $(i{:}A, o{:}B)$ to denote the configuration of the input data ($i$) and the reconstruction target at the output layer ($o$) in one channel of SMCAE. We thus label the tasks in the left and right channels of SMCAE as $(i{:}X_s, o{:}X_r)^l$ and $(i{:}X_r, o{:}X_r)^r$, respectively, where $(\cdot)^l$ and $(\cdot)^r$ indicate the left and right channel of SMCAE, and $X_s$ and $X_r$ stand for synthetic and real data, respectively. The tasks in the two channels share the same parameters $\theta_e$ in all hidden layers, which forces the autoencoder to learn structures common to both tasks. At the output layer, the SMCAE splits into two separate channels with their own parameters $\theta_d^l$ and $\theta_d^r$.

Our target is to minimize the reconstruction error of the two tasks of SMCAE together while taking into account the balance between the two channels. The new objective function of SMCAE is thus

$$E = J^l(\theta_e, \theta_d^l) + J^r(\theta_e, \theta_d^r) + \gamma \Psi \tag{3}$$

We add $\Psi = \frac{1}{2}\left(J^l(\theta_e, \theta_d^l) - J^r(\theta_e, \theta_d^r)\right)^2$ as a regularisation term to balance the learning rate between the two channels, with $\gamma$ weighting its importance.

The regularization term $\Psi$ in Eq. (3) is a novel contribution of our SMCAE. Basically, $\Psi$ penalizes situations where the learning errors of the two channels differ greatly. Since the data at the input and output ends of the two channels are not symmetric in the SMCAE configuration, the learning errors produced by optimizing the two channels are very different. Including $\Psi$ in our objective prevents the optimization of one channel from dominating the entire SMCAE, which helps SMCAE to balance the learning process and find a compromise between the two channels. To show the importance of $\Psi$ in our objective, we report the learning results for different values of $\gamma$ in Fig. 7.

The minimization of Eq. (3) is achieved by back-propagation combined with a quasi-Newton method, L-BFGS. With the balance regularization added to the objective, the only difference from a sparse autoencoder is the gradient computation for the unknown parameters $\theta_e$, $\theta_d^l$ and $\theta_d^r$, which follows from applying the chain rule to Eq. (3).
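Writing $\Delta = J^l(\theta_e, \theta_d^l) - J^r(\theta_e, \theta_d^r)$ and $\Psi = \frac{1}{2}\Delta^2$, the chain rule applied to Eq. (3) expresses the SMCAE gradients as reweighted single-channel gradients. The derivation below is worked from these definitions as an illustration:

```latex
\begin{aligned}
\frac{\partial E}{\partial \theta_e} &= (1 + \gamma\Delta)\,\frac{\partial J^l}{\partial \theta_e}
  + (1 - \gamma\Delta)\,\frac{\partial J^r}{\partial \theta_e},\\
\frac{\partial E}{\partial \theta_d^l} &= (1 + \gamma\Delta)\,\frac{\partial J^l}{\partial \theta_d^l},\\
\frac{\partial E}{\partial \theta_d^r} &= (1 - \gamma\Delta)\,\frac{\partial J^r}{\partial \theta_d^r}.
\end{aligned}
```

Each single-channel gradient is the usual sparse-autoencoder gradient. When the left channel's error is larger ($\Delta > 0$), its gradient is up-weighted and the right channel's is down-weighted; with $\gamma = 0$ both weights equal 1 and the two channels are simply summed.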
We train SMCAE in a greedy manner, one layer at a time. The configuration for training one layer of SMCAE is shown in Fig. 3 (right). The output of a trained layer is then sent as input to the next layer for training. Once all layers are trained, the entire stacked structure is fine-tuned. After SMCAE has been trained, new synthetic data are transformed by sending them through the left channel of the SMCAE, $(i{:}X_s, o{:}X_r)^l$, and we take the output of this process as the transformed synthetic data.
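A minimal single-layer sketch of this training follows. It assumes plain full-batch gradient descent instead of L-BFGS and omits the weight-decay and sparsity terms for brevity: each step back-propagates both channels and combines their gradients with the weights $1 \pm \gamma(J^l - J^r)$ obtained by differentiating Eq. (3). Function names, learning rate and step count are hypothetical; greedy stacking and fine-tuning are not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_loss_and_grads(x_in, target, W1, b1, W2, b2):
    """Reconstruction loss J = (1/n) sum ||y - target||^2 and its gradients."""
    n = x_in.shape[0]
    h = sigmoid(x_in @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    loss = np.sum((y - target) ** 2) / n
    dz2 = (2.0 / n) * (y - target) * y * (1.0 - y)   # back-prop through output sigmoid
    dz1 = (dz2 @ W2.T) * h * (1.0 - h)              # back-prop through hidden sigmoid
    return loss, (x_in.T @ dz1, dz1.sum(axis=0), h.T @ dz2, dz2.sum(axis=0))


def train_smcae_layer(x_s, x_r, k, gamma=1.0, lr=0.5, steps=300, seed=0):
    """Gradient descent on E = J^l + J^r + gamma * 0.5 * (J^l - J^r)^2."""
    rng = np.random.default_rng(seed)
    d = x_s.shape[1]
    W1, b1 = rng.normal(0, 0.1, (d, k)), np.zeros(k)      # shared theta_e
    W2l, b2l = rng.normal(0, 0.1, (k, d)), np.zeros(d)    # left decoder theta_d^l
    W2r, b2r = rng.normal(0, 0.1, (k, d)), np.zeros(d)    # right decoder theta_d^r
    history = []
    for _ in range(steps):
        jl, (gW1l, gb1l, gW2l, gb2l) = channel_loss_and_grads(x_s, x_r, W1, b1, W2l, b2l)
        jr, (gW1r, gb1r, gW2r, gb2r) = channel_loss_and_grads(x_r, x_r, W1, b1, W2r, b2r)
        diff = jl - jr
        a_l, a_r = 1.0 + gamma * diff, 1.0 - gamma * diff   # chain-rule weights
        history.append(jl + jr + 0.5 * gamma * diff ** 2)   # E before this update
        W1 -= lr * (a_l * gW1l + a_r * gW1r)
        b1 -= lr * (a_l * gb1l + a_r * gb1r)
        W2l -= lr * a_l * gW2l
        b2l -= lr * a_l * gb2l
        W2r -= lr * a_r * gW2r
        b2r -= lr * a_r * gb2r
    return (W1, b1, W2l, b2l), history


def transform_synthetic(x_new, left_params):
    """Send new synthetic data through the left channel (i:X_s, o:X_r)^l."""
    W1, b1, W2l, b2l = left_params
    return sigmoid(sigmoid(x_new @ W1 + b1) @ W2l + b2l)
```

Usage: `left_params, hist = train_smcae_layer(x_s, x_r, k=8)` followed by `transform_synthetic(x_new, left_params)`. Because the update is exactly the gradient of $E$, a small enough step size makes `hist` decrease.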


C. Competitors 

As shown in Fig. 4, we compare the SMCAE configuration to three alternative configurations: (1) SMCAE-II, which uses two separate channels, i.e. $(i{:}X_s, o{:}X_s)^l$ and $(i{:}X_r, o{:}X_r)^r$. (2) Stacked autoencoder type-I (SAE-I), which merges the two tasks in a single-channel stacked autoencoder with the configuration $(i{:}X_sX_r, o{:}X_rX_r)$. (3) Stacked autoencoder type-II (SAE-II), which simply transforms source data to target data and is configured as $(i{:}X_s, o{:}X_r)$.
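The four configurations differ only in which (input, target) pairs each channel is trained on. The sketch below makes this explicit; it assumes paired, row-aligned arrays and reads the SAE-I notation $(i{:}X_sX_r, o{:}X_rX_r)$ as stacking the two input sets as extra samples, which is an assumption rather than a detail stated here. The function name is hypothetical.

```python
import numpy as np


def training_pairs(config, x_s, x_r):
    """Return one (input, target) pair per channel for a given configuration.

    x_s and x_r are paired arrays of shape (n, d): row i of x_s is the synthetic
    version of row i of x_r (e.g. a sketch and its photo).
    """
    if config == "SMCAE":      # shared encoder; (i:X_s, o:X_r)^l and (i:X_r, o:X_r)^r
        return [(x_s, x_r), (x_r, x_r)]
    if config == "SMCAE-II":   # two channels, each reconstructing its own domain
        return [(x_s, x_s), (x_r, x_r)]
    if config == "SAE-I":      # single channel on stacked samples (assumed reading)
        return [(np.vstack([x_s, x_r]), np.vstack([x_r, x_r]))]
    if config == "SAE-II":     # single channel mapping source to target
        return [(x_s, x_r)]
    raise ValueError(f"unknown configuration: {config}")
```

Only SMCAE pairs a synthetic-to-real channel with a real-to-real channel through shared hidden layers, which is the property the comparison below relies on.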


















Compared with SAE-I and SAE-II, our two-channel structures offer more flexibility. Critically, the single-channel models force synthetic data to fit real data, which causes synthetic data to lose information and become less useful for recognition. In contrast, SMCAE can explore ‘characters’ and ‘patterns’ common to both synthetic and real data. Intrinsically, SMCAE first encodes both synthetic and real data into common hidden layers that model information useful for recognition. The decoding process then transforms the synthetic data to better simulate real data. Although SMCAE-II has the same two branches in its structure, it does not learn such a transformation between synthetic and real data.


Fig. 4. Illustration of the compared configurations: SMCAE, SMCAE-II, SAE-I and SAE-II.


IV. Experiments and Results

We first evaluate SMCAE on the challenging task of face-sketch recognition [31], [33] using the CUFSF dataset, and show that SMCAE outperforms the alternative configurations. To further validate the efficacy of our framework, we train SMCAE on handwritten digit images and generate synthetic data to simulate real images. We show that the synthetic data can help train classifiers for recognition.

Dataset. We conduct our experiments on two different datasets: (1) The CUFSF dataset (TQ, [33], containing photos and sketches of 1194 people with lighting variations. We employ the standard split defined in [31], (^, which selects 500 persons as the training set and the remaining 694 persons as the testing set. (2) The handwritten digits dataset² (HWDUCI), containing 5620 instances in total, of which 3823 are used for training and 1797 for testing. The handwritten digits from 0 to 9 in this dataset were collected from 43 people: 30 contributed to the training set and the other 13 to the test set. For all experiments, we empirically set the number of hidden layers in SMCAE to two, each with 1000 nodes. The same settings are used for SMCAE-II, SAE-I and SAE-II to make the configurations comparable.

Evaluation Metrics. We report the following metrics when they are available: (1) F1-score, defined as $F_1 = 2 \cdot (\mathrm{Precision} \cdot \mathrm{Recall}) / (\mathrm{Precision} + \mathrm{Recall})$. (2) Receiver Operating Characteristic (ROC) curves and VR@0.1%FAR, the Verification Rate (VR) at a 0.1% False Acceptance Rate (FAR). VR@0.1%FAR is a standard evaluation metric proposed in ED. (3) Rank-1 recognition accuracy.

Features. (1) Similar to (1411, we use Histogram of Oriented Gradients (HOG) features on the CUFSF dataset. To further reduce the computational cost, the resolution of all photos and sketches is reduced to 50 × 50 pixels, and the HOG cell size is set to 3. (2) On the HWDUCI dataset we also use HOG features with cell size 3.

Classifiers. For the CUFSF dataset, nearest-neighbor search with the Euclidean metric is used to retrieve the most similar photo for each query sketch. For handwritten digit classification, a Support Vector Machine (SVM) with an RBF kernel³ is used.

A. Results on the CUFSF dataset 

In all experiments on this dataset, the HOG features of sketch images are first transformed by the SMCAE and then used as queries. We first compare the results of photo-sketch matching using HOG features transformed by SMCAE, SMCAE-II, SAE-I and SAE-II. The results are reported as ROC curves and VR@0.1%FAR. The dissimilarity between a photo and a sketch is computed as the Euclidean distance between their descriptors.
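The matching protocol and the VR@0.1%FAR and rank-1 metrics can be sketched as follows, assuming that query sketch $i$ is matched against a gallery of photos whose $i$-th entry is its true match and that all off-diagonal pairs are impostors. The function names are hypothetical, and threshold and tie handling in the standard protocol may differ from this sketch.

```python
import numpy as np


def pairwise_euclidean(queries, gallery):
    """Euclidean distances between every query (sketch) and gallery item (photo)."""
    sq = (np.sum(queries ** 2, axis=1)[:, None] + np.sum(gallery ** 2, axis=1)[None, :]
          - 2.0 * queries @ gallery.T)
    return np.sqrt(np.maximum(sq, 0.0))      # clip tiny negatives from round-off


def rank1_accuracy(dist):
    """Fraction of queries whose nearest gallery item is the true match (index i)."""
    return float(np.mean(np.argmin(dist, axis=1) == np.arange(dist.shape[0])))


def vr_at_far(dist, far=0.001):
    """Verification rate at a given false acceptance rate.

    Genuine pairs are the diagonal (i, i); impostors are all off-diagonal pairs.
    A pair is accepted when its distance is strictly below the threshold.
    """
    n = dist.shape[0]
    genuine = np.diag(dist)
    impostor = dist[~np.eye(n, dtype=bool)]
    k = int(np.floor(far * impostor.size))
    thr = np.sort(impostor)[k]               # at most k impostor distances are < thr
    return float(np.mean(genuine < thr))


def f1_score(precision, recall):
    return 2.0 * precision * recall / (precision + recall)
```

Sweeping `far` over a range of values and plotting `vr_at_far` against it gives the ROC curve; VR@0.1%FAR is the point at `far=0.001`.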



Fig. 5. Results on CUFSF dataset. Left: ROC curve of different methods; 
Right: VR@0.1%FAR of different methods. 


The ROC curves and VR@0.1%FAR are shown in Fig. 5. The proposed SMCAE achieves the highest AUC and VR@0.1%FAR and clearly outperforms the alternative configurations. We also report the VR@0.1%FAR of state-of-the-art approaches, including LFDA (H, CITE [33] and classic eigenfaces (PCA) [24]. Note that some previous works obtained better results by combining multiple features; for example, in [33], multiple CITE features generated by a random forest are used to better match photos and sketches. To enable a fairer comparison, we focus on matching results obtained with a single (uncombined) feature.

There are several reasons why our SMCAE outperforms the other approaches. First, compared with SMCAE-II, the configuration of SMCAE includes a task that explicitly transforms synthetic data into real data, and thus better reduces the distance between them. Second, unlike SAE-I, which merges the two tasks into a single channel, SMCAE employs two channels that keep the tasks separate while both aim to reconstruct the main ‘characters’ and ‘patterns’ co-existing in both tasks. Thus synthetic data can be transformed into real data with less error. Finally, SMCAE is better than SAE-II because SMCAE learns features of real data in the task $(i{:}X_r, o{:}X_r)^r$. These features better compensate for the difference between synthetic and real data during the transformation.

Fig. 6. Rank-1 accuracy of different methods on the CUFSF dataset.

² Collected from the UCI Machine Learning Repository (HWDUCI) 0.

³ The parameters are cross-validated.



Fig. 7. Rank-1 accuracy for different values of γ in Eq. (3) (x-axis: value of the parameter γ); results for γ = 0, 0.5, 1, 5, 10, 50 and 100 are shown.

We further validate the results using Rank-1 recognition accuracy, which is also reported in ifT^ . 1^ . The results are shown in Fig. 6. The methods of ms, are comparable to our SMCAE. Method IT^ employs a discriminant common subspace to maximize the between-class variations and minimize the within-class variations. Method [30] uses a structure composed of two autoencoders. As can be seen in Fig. 6, SMCAE outperforms all other methods.

Parameter Validation in Eq. (3). To validate the significance of Ψ in Eq. (3), we set γ to different values and report the rank-1 accuracy in Fig. 7. In particular, with γ = 0, SMCAE takes twice as long to converge as with γ = 50, the value used in this work. Furthermore, with γ = 0 the rank-1 accuracy drops by more than 2%. This validates the importance of the Ψ term discussed in Sec. III-B.

Qualitative results. Some qualitative results are shown in Fig. 8. They show that sketch HOG features transformed by our SMCAE are more similar to the HOG features of the ground-truth photo.

B. Handwritten Digit Recognition 

Generating synthetic data. A synthetic version of each real character is generated as a variant of a centralized model learned from real characters. The centralized model of a digit is shaped by control points $C = \{c_i\}_{i=1}^{n}$ placed on the boundary of the digit. A technique called migration is used to locate the corresponding control points on each real digit image. A synthetic digit image can then be generated by filling the areas enclosed by the control points⁴. Examples of generated synthetic digits are shown in Fig. 9. To generate additional synthetic data, which are used to train the classifier after being transformed by the trained SMCAE, we assume that the locations of the control points follow a multivariate normal distribution $C \sim \mathcal{N}(\mu, \Sigma)$, with $\mu$ and $\Sigma$ estimated from the control points on the synthetic digit images. For each digit, 3,000 new synthetic images are generated by randomly drawing samples from $\mathcal{N}(\mu, \Sigma)$.
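Fitting and sampling the control-point distribution can be sketched with NumPy as follows. It assumes that control points are already in correspondence across images (as migration provides) and are flattened into one vector per image; rendering a synthetic image by filling the enclosed areas is not shown, and the function names are hypothetical.

```python
import numpy as np


def fit_control_point_gaussian(point_sets):
    """Fit N(mu, Sigma) to flattened control-point sets.

    point_sets: array (num_images, m, 2) holding m (x, y) control points per image,
    with point j in corresponding positions across images.
    """
    flat = point_sets.reshape(point_sets.shape[0], -1)   # (num_images, 2m)
    mu = flat.mean(axis=0)
    sigma = np.cov(flat, rowvar=False)                   # (2m, 2m) covariance
    return mu, sigma


def sample_control_points(mu, sigma, num_samples, seed=0):
    """Draw new control-point sets from N(mu, Sigma) and restore the (m, 2) layout."""
    rng = np.random.default_rng(seed)
    flat = rng.multivariate_normal(mu, sigma, size=num_samples)
    return flat.reshape(num_samples, -1, 2)
```

For example, `sample_control_points(mu, sigma, 3000)` yields 3,000 control-point sets for one digit, matching the number of new synthetic images per digit stated above.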
Fig. 9. Illustration of real digit images (upper row) and corresponding
synthetic versions (lower row). 


We compare our SMCAE with SMCAE-II, SAE-I, SAE-II, LeNet-5 [1] and the best results m reported on this dataset. The classification performance is evaluated by F1-score. An SVM classifier with an RBF kernel is used in the experiments. For SMCAE, SMCAE-II, SAE-I and SAE-II, real training data together with transformed synthetic data are used to train the SVM.

As shown in Fig. 10 (left), the SVM classifier with our SMCAE outperforms all the alternative methods. This validates the effectiveness of our framework in generating synthetic data that better help train a classifier.

To further demonstrate how transformed synthetic data improve the classification results, we trained classifiers on different combinations of training sets, shown in Fig. 10 (right). Four combinations of training sets are used. First, to obtain a performance baseline, we trained the SVM using real data only. To investigate how much improvement an SVM trained on transformed synthetic data can bring, we trained SVMs on synthetic data and on transformed synthetic data, respectively. The best performance is obtained with an SVM trained on real data together with transformed synthetic data.

⁴ Please refer to the supplementary material for details.

Fig. 10. Comparison of classification results in F1-score for different methods (left: SMCAE+SVM, SMCAE-II+SVM, SAE-I+SVM, SAE-II+SVM and LeNet-5 [1]) and different training datasets (right). In the left figure, all the methods in the test used the same training dataset, which combines original real data and transformed synthetic data.

Fig. 11. F1-scores and the corresponding standard deviations as the number of synthetic samples used in training increases.

With more synthetic training data generated by SMCAE, we obtain a large improvement in classification. We also note that training on transformed synthetic data alone and on real plus transformed synthetic data yields the same F1-score (0.989) in Fig. 10 (right), which highlights the effectiveness of SMCAE in transforming synthetic data to simulate real data.

Finally, we evaluate how the amount of synthetic data affects the classification results. We gradually increase the amount of transformed synthetic data (from 300 to 3,300 samples) used to train the SVM. The classification results are reported in Fig. 11. The curve shows an ascending trend as more samples are added, which suggests that the added transformed synthetic data remain useful for classification over this range.

V. Conclusion 

In this paper we identify the synthetic gap problem. To solve this problem, we propose a novel Stacked Multichannel Autoencoder (SMCAE) model. SMCAE has multiple channels in its structure and is an extension of a standard autoencoder. We show that SMCAE not only bridges the synthetic gap between real and synthetic data, but also jointly learns from both.
Supplemental Material

VI. Optimization of SMCAE 

With two branches in the SMCAE, we aim to minimize the reconstruction error of the two tasks together while taking into account the balance between the two branches. The new objective function is given as:

$$E = J^l(\theta_e, \theta_d^l) + J^r(\theta_e, \theta_d^r) + \gamma \Psi \tag{6}$$

where

$$\Psi = \frac{1}{2}\left(J^l(\theta_e, \theta_d^l) - J^r(\theta_e, \theta_d^r)\right)^2 \tag{7}$$

is a regularization term added to balance the learning rate between the two branches. In the SMCAE, with the balance regularization added to the objective, the only difference from a sparse autoencoder is the gradient computation for the unknown parameters $\theta_e$, $\theta_d^l$ and $\theta_d^r$.

The exact form of the gradients with respect to $\theta_e$, $\theta_d^l$ and $\theta_d^r$ varies according to the sparsity regularization $\Omega$ used in the framework.


VII. Generating Synthetic Data 

Synthetic data are created to highlight potentially useful patterns in real images. In the proposed approach, the synthetic data are represented as a parametric model of a set of control points and edges associated with these points in the images. From the control points, synthetic images can be generated to simulate the real images in terms of having the same structure or a similar appearance. Initially, the control points are selected from a centralized prototype that generalizes all images in the same class. The locations of the control points are then iteratively optimized until convergence, in order to minimize the distance between the synthetic image generated by the control points and the real image. We denote the control points and the edges associated with them as $S = \{C, E\}$, where $C = \{c_i\}_{i=1}^{n}$ is the set of control points and $E = \{(c_i, c_j)\},\ 1 \le i, j \le n$, is the set of edges connecting control points. A generalized algorithm for obtaining the best matching synthetic image is given in Algorithm 1.


Algorithm 1 Get Matching Synthetic Image.

Input:

• A real image U.

• A set of control points S = {C, E} with all control points c_i ∈ C set to their initial positions.

• A prototype image V generated using the initial S.

1: while S is not converged do

2: S = OptimizeControlPoints(U, V, S).

3: Generate V using S.

4: end while

5: Generate synthetic image I using S.

6: return I.


A. Learning Synthetic Prototype from Data 

In the handwritten digit dataset used in this work, we learn a centralized prototype from the given data. A digit prototype is generated for all images of the same digit. The congealing algorithm proposed in GU is employed in this step to produce the synthetic prototypes for digits. In congealing, projective transformations are applied to the images to minimize a joint entropy. The prototype is then taken to be the average of all images after congealing, as shown in Fig. 12.

Control points are then evenly sampled from the boundary detected in the prototype image. The control points need to be mapped to each digit image in order to generate a synthetic image. To find this mapping, we implement an approach that migrates the control points from the prototype image to the destination image.



Fig. 13. Illustration of control points on a digit image. 


This point migration algorithm is based on a series of intermediate images generated between the synthetic prototype and the destination image. To generate the intermediate images, we binarize all the images and compute the distance transformed images [9] of the synthetic prototype and the real image. Given the number of steps, an intermediate image is then generated as the binarized linear interpolation between the two distance transformed images. In each step, the control points are snapped to the closest boundary pixels of the intermediate image. The algorithm OptimizeControlPoints(U, V, S) for this situation is given in Algorithm 2; we fix the number of steps to 5 in this algorithm. A step-by-step example is given in Fig. 14. A zoomed-in example showing how control points move from one digit to another is shown in Fig. 15.

Fig. 12. Illustration of average images of each digit after congealing.




Fig. 15. An example of migration of the control points from source image 
(blue) to destination image (red). 


Algorithm 2 OptimizeControlPoints(U, V, S)

Input:

• A real image U.

• A prototype of the synthetic image S = {C, E}.

• A synthetic image V.

1: steps = 10.

2: Compute the distance transform images of U and V as U′ and V′.

3: for i = 1 to steps do

4: I = (1 − i/steps) · V′ + (i/steps) · U′.

5: I = Binarize(I).

6: Update S by snapping each control point to the closest boundary pixel on I.

7: end for

8: Set the status of S to be converged.

9: return S.
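The migration in Algorithm 2 can be sketched in NumPy under stated assumptions: a brute-force signed distance transform (negative inside the digit) is used so that thresholding the interpolated map at zero yields the intermediate binary shape; interpolation runs from the prototype toward the real image; and foreground pixels with a background 4-neighbour count as boundary pixels. Names are hypothetical and the step count is a parameter.

```python
import numpy as np


def signed_distance(mask):
    """Brute-force signed Euclidean distance: negative inside the shape, positive outside.

    Adequate for small digit images; mask must contain both foreground and background.
    """
    coords = np.argwhere(np.ones_like(mask, dtype=bool)).astype(float)  # all pixels
    fg = np.argwhere(mask).astype(float)
    bg = np.argwhere(~mask).astype(float)
    d_fg = np.sqrt(((coords[:, None, :] - fg[None, :, :]) ** 2).sum(-1)).min(axis=1)
    d_bg = np.sqrt(((coords[:, None, :] - bg[None, :, :]) ** 2).sum(-1)).min(axis=1)
    return np.where(mask.ravel(), -d_bg, d_fg).reshape(mask.shape)


def boundary_pixels(mask):
    """(row, col) of foreground pixels that touch the background (4-neighbourhood)."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)


def migrate_control_points(points, prototype_mask, real_mask, steps=5):
    """Move (row, col) control points from the prototype boundary to the real boundary."""
    sd_proto = signed_distance(prototype_mask)
    sd_real = signed_distance(real_mask)
    pts = points.astype(float)
    for i in range(1, steps + 1):
        t = i / steps
        inter = (1.0 - t) * sd_proto + t * sd_real < 0    # binarized intermediate image
        bnd = boundary_pixels(inter).astype(float)
        if len(bnd) == 0:
            continue
        d = ((pts[:, None, :] - bnd[None, :, :]) ** 2).sum(-1)
        pts = bnd[np.argmin(d, axis=1)]                   # snap to closest boundary pixel
    return pts
```

At the last step the interpolated map equals the real image's signed distance, so every returned point lies on the real digit's boundary. For larger images, an O(N) distance transform would replace the brute-force one.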


To generate more synthetic digit images, we assume that the locations of the control points on each digit image follow a multivariate normal distribution $C \sim \mathcal{N}(\mu, \Sigma)$, where $\mu$ and $\Sigma$ are computed using the existing control points. The distribution of the control points of each digit is visualized in Fig. 16.






















Fig. 14. Illustrations of the migration of control points and the intermediate synthetic images generated using the control points in each step. The distance transform images of the synthetic prototype and the real image are shown as the leftmost and rightmost images, respectively.



Fig. 16. Illustration of distributions of control points on each digit image, where colors from blue to red are used to represent the probability density from 
low to high. 













